Named entity recognition (NER) aims to identify entities in text that belong to predefined semantic types such as person, location, and organization. State-of-the-art solutions for flat NER commonly struggle to capture fine-grained semantic information in the underlying text. Existing span-based approaches overcome this limitation, but computation time remains a concern. In this work, we propose a novel span-based NER framework, Global Pointer (GP), which leverages relative positions through a multiplicative attention mechanism. The ultimate goal is to enable a global view that considers both the start and end positions when predicting an entity. To this end, we design two modules to identify the head and tail of a given entity, so as to reduce the inconsistency between the training and inference processes. Moreover, we introduce a novel classification loss function to address the label-imbalance problem. On the parameter side, we introduce a simple but effective approximation method to reduce the number of training parameters. We evaluate GP extensively on various benchmark datasets. Our extensive experiments show that GP outperforms existing solutions. Furthermore, the experimental results demonstrate the efficacy of the introduced loss function compared with softmax and entropy alternatives.
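To make the span-scoring idea concrete, here is a minimal, illustrative PyTorch sketch of a GP-style head that scores every (start, end) pair per entity type with a multiplicative (dot-product) attention; it is not the authors' reference implementation, the rotary/relative-position component is omitted, and names such as `GlobalPointerHead` and `head_dim` are assumptions.

```python
# Illustrative sketch of a Global-Pointer-style span scorer (not the paper's code).
# Assumes a PyTorch encoder returning hidden states of shape (batch, seq_len, hidden);
# the relative-position (rotary) component is omitted for brevity.
import torch
import torch.nn as nn

class GlobalPointerHead(nn.Module):
    def __init__(self, hidden_size: int, num_types: int, head_dim: int = 64):
        super().__init__()
        self.num_types = num_types
        self.head_dim = head_dim
        # one query/key projection pair per entity type
        self.qk_proj = nn.Linear(hidden_size, num_types * head_dim * 2)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        b, n, _ = hidden.shape
        qk = self.qk_proj(hidden).view(b, n, self.num_types, 2, self.head_dim)
        q, k = qk[..., 0, :], qk[..., 1, :]                      # (b, n, types, d)
        # score for span (start i, end j): scaled dot product of q_i and k_j
        scores = torch.einsum("bmtd,bntd->btmn", q, k) / self.head_dim ** 0.5
        # mask out spans whose end precedes their start
        mask = torch.tril(torch.ones(n, n, dtype=torch.bool, device=hidden.device), diagonal=-1)
        return scores.masked_fill(mask, float("-inf"))

scores = GlobalPointerHead(hidden_size=768, num_types=3)(torch.randn(2, 10, 768))
print(scores.shape)  # torch.Size([2, 3, 10, 10])
```

In the full framework, the relative-position mechanism and the classification loss mentioned in the abstract would be applied on top of these span scores.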
In the deep learning era, the loss function determines the range of tasks that models and algorithms can handle. To support the application of deep learning to multi-label classification (MLC) tasks, in this paper we propose the ZLPR (zero-bounded log-sum-exp & pairwise rank-based) loss. Compared with other rank-based losses for MLC, ZLPR can handle the problem that the number of target labels is uncertain, which in this respect makes it as capable as the other two strategies commonly used for MLC, namely binary relevance (BR) and label powerset (LP). In addition, ZLPR takes the correlation between labels into account, which makes it more comprehensive than BR methods. In terms of computational complexity, ZLPR is competitive with BR methods, since its prediction is also label-independent, which makes it require less time and memory than LP methods. Our experiments demonstrate the effectiveness of ZLPR on multiple benchmark datasets and under multiple evaluation metrics. Furthermore, we propose a soft version of ZLPR and the corresponding KL-divergence computation, which makes it possible to apply regularization tricks such as label smoothing to improve model generalization.
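A hedged sketch of the zero-bounded log-sum-exp pairwise idea described above is shown below: positive-label scores are pushed above a zero threshold and negative-label scores below it, via two log-sum-exp terms anchored at 0. This is an illustrative implementation, not the authors' released code, and the function name is an assumption.

```python
# Illustrative PyTorch sketch of a ZLPR-style loss (not the authors' reference code).
import torch

def zlpr_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """logits: (batch, num_labels) raw scores; targets: same shape, 0/1 floats."""
    pos = (-logits).masked_fill(targets == 0, float("-inf"))  # -s for positive labels
    neg = logits.masked_fill(targets == 1, float("-inf"))     #  s for negative labels
    zero = torch.zeros(logits.size(0), 1, device=logits.device)
    # log(1 + sum_pos e^{-s}) and log(1 + sum_neg e^{s}); the "1" is an appended 0 logit
    pos_term = torch.logsumexp(torch.cat([pos, zero], dim=-1), dim=-1)
    neg_term = torch.logsumexp(torch.cat([neg, zero], dim=-1), dim=-1)
    return (pos_term + neg_term).mean()

loss = zlpr_loss(torch.randn(4, 10), (torch.rand(4, 10) > 0.7).float())
print(loss.item())
```

At inference time, labels whose scores exceed zero are predicted, which is how the zero bound accommodates an uncertain number of target labels.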
The softmax function is widely used in artificial neural networks for multi-class classification, where the softmax transformation forces the outputs to be positive and sum to one, and the corresponding loss function allows the model to be optimized with the maximum likelihood principle. However, softmax leaves a large margin for the loss function to carry out its optimization in high-dimensional classification, which leads to low performance to some extent. In this paper, we present an empirical study of a simple and concise softmax variant, sparse-softmax, to alleviate the problems that arise with the traditional softmax in high-dimensional classification. We evaluate our approach on several interdisciplinary tasks, and the experimental results show that sparse-softmax is simpler, faster, and produces better results than the baseline models.
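One common way to realize such a sparse variant is to keep only the top-k logits before normalization and zero out the rest; the sketch below illustrates this idea and may differ in detail from the paper's exact formulation. The value of k is an assumed hyperparameter.

```python
# Minimal sketch of a top-k sparse softmax: probability mass is restricted to the
# k largest logits and all other entries are zeroed. Illustrative only.
import torch

def sparse_softmax(logits: torch.Tensor, k: int = 5) -> torch.Tensor:
    topk_vals, topk_idx = logits.topk(k, dim=-1)
    probs = torch.zeros_like(logits)
    probs.scatter_(-1, topk_idx, torch.softmax(topk_vals, dim=-1))
    return probs

p = sparse_softmax(torch.randn(2, 10000), k=5)
print(p.sum(dim=-1))  # each row sums to 1, with at most 5 non-zero entries
```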
Position encoding has recently been shown to be effective in the Transformer architecture. It provides valuable supervision for modeling dependencies between elements at different positions in a sequence. In this paper, we first investigate various methods of integrating positional information into the learning process of Transformer-based language models. We then propose a novel method named Rotary Position Embedding (RoPE) to leverage positional information effectively. Specifically, the proposed RoPE encodes the absolute position with a rotation matrix while incorporating explicit relative-position dependency into the self-attention formulation. Notably, RoPE has valuable properties, including flexibility with respect to sequence length, a decay of inter-token dependency with increasing relative distance, and the capability of equipping linear self-attention with relative position encoding. Finally, we evaluate the enhanced Transformer with rotary position embedding, also called RoFormer, on various long-text classification benchmark datasets. Our experiments show that it consistently outperforms its alternatives. In addition, we provide a theoretical analysis to explain some of the experimental results. RoFormer has already been integrated into HuggingFace: \url{https://huggingface.co/docs/transformers/model_doc/roformer}.
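The sketch below illustrates the core rotation idea: each pair of feature dimensions in the queries and keys is rotated by a position-dependent angle, so the dot product q_m · k_n depends only on the relative offset m - n. It is a simplified illustration, not the reference implementation; the half-split pairing used here is one common convention (implementations may instead pair adjacent even/odd dimensions).

```python
# Illustrative sketch of rotary position embedding (RoPE), not the reference code.
import torch

def apply_rope(x: torch.Tensor) -> torch.Tensor:
    """x: (batch, seq_len, dim) with dim even."""
    b, n, d = x.shape
    half = d // 2
    freqs = 10000 ** (-torch.arange(0, half, dtype=torch.float32) / half)
    angles = torch.arange(n, dtype=torch.float32)[:, None] * freqs[None, :]  # (n, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # rotate each (x1, x2) pair by its position-dependent angle
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = apply_rope(torch.randn(1, 8, 64))
k = apply_rope(torch.randn(1, 8, 64))
attn = q @ k.transpose(-2, -1)  # attention scores now carry relative-position information
print(attn.shape)
```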
Participants in political discourse employ rhetorical strategies -- such as hedging, attributions, or denials -- to display varying degrees of belief commitments to claims proposed by themselves or others. Traditionally, political scientists have studied these epistemic phenomena through labor-intensive manual content analysis. We propose to help automate such work through epistemic stance prediction, drawn from research in computational semantics, to distinguish at the clausal level what is asserted, denied, or only ambivalently suggested by the author or other mentioned entities (belief holders). We first develop a simple RoBERTa-based model for multi-source stance predictions that outperforms more complex state-of-the-art modeling. Then we demonstrate its novel application to political science by conducting a large-scale analysis of the Mass Market Manifestos corpus of U.S. political opinion books, where we characterize trends in cited belief holders -- respected allies and opposed bogeymen -- across U.S. political ideologies.
While inferring common actor states (such as position or velocity) is an important and well-explored task of the perception system aboard a self-driving vehicle (SDV), it may not always provide sufficient information to the SDV. This is especially true in the case of active emergency vehicles (EVs), where light-based signals also need to be captured to provide a full context. We consider this problem and propose a sequential methodology for the detection of active EVs, using an off-the-shelf CNN model operating at a frame level and a downstream smoother that accounts for the temporal aspect of flashing EV lights. We also explore model improvements through data augmentation and training with additional hard samples.
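A hedged sketch of the downstream temporal-smoothing idea follows: frame-level scores from an off-the-shelf CNN are aggregated over a sliding window so that a flashing light, which is off in many individual frames, still yields a stable "active EV" decision. The window size, threshold, and function name are illustrative choices, not the paper's exact smoother.

```python
# Illustrative moving-average smoother over per-frame CNN scores (assumed design).
from collections import deque

def smooth_predictions(frame_scores, window: int = 15, threshold: float = 0.4):
    """frame_scores: iterable of per-frame probabilities that an active EV is visible."""
    buf, decisions = deque(maxlen=window), []
    for score in frame_scores:
        buf.append(score)
        decisions.append(sum(buf) / len(buf) >= threshold)  # windowed average vote
    return decisions

# e.g. a flashing light: high scores only on alternating frames
print(smooth_predictions([0.9, 0.1] * 10))
```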
A key feature of federated learning (FL) is to preserve the data privacy of end users. However, there still exists potential privacy leakage in exchanging gradients under FL. As a result, recent research often explores differential privacy (DP) approaches that add noise to the computed results to address privacy concerns with low overheads, which however degrades the model performance. In this paper, we strike a balance between data privacy and efficiency by utilizing the pervasive social connections between users. Specifically, we propose SCFL, a novel Social-aware Clustered Federated Learning scheme, where mutually trusted individuals can freely form a social cluster and aggregate their raw model updates (e.g., gradients) inside each cluster before uploading to the cloud for global aggregation. By mixing model updates in a social group, adversaries can only eavesdrop on the social-layer combined results, but not the privacy of individuals. We unfold the design of SCFL in three steps. i) Stable social cluster formation. Considering users' heterogeneous training samples and data distributions, we formulate the optimal social cluster formation problem as a federation game and devise a fair revenue allocation mechanism to resist free-riders. ii) Differentiated trust-privacy mapping. For clusters with low mutual trust, we design a customizable privacy preservation mechanism to adaptively sanitize participants' model updates depending on social trust degrees. iii) Distributed convergence. A distributed two-sided matching algorithm is devised to attain an optimized disjoint partition with Nash-stable convergence. Experiments on the Facebook network and MNIST/CIFAR-10 datasets validate that our SCFL can effectively enhance learning utility, improve user payoff, and enforce customizable privacy protection.
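A minimal sketch of the in-cluster aggregation step described above is shown below: members of a mutually trusted cluster combine their raw updates locally, and only the cluster-level aggregate leaves the cluster, so the server never sees an individual's gradient. Function and variable names are illustrative assumptions, not the SCFL implementation.

```python
# Toy illustration of social-cluster aggregation before uploading to the cloud.
import numpy as np

def aggregate_cluster(member_updates, weights=None):
    """member_updates: list of 1-D numpy arrays (flattened model updates)."""
    weights = weights or [1.0 / len(member_updates)] * len(member_updates)
    return sum(w * u for w, u in zip(weights, member_updates))

cluster_updates = [np.random.randn(1000) for _ in range(5)]  # five trusted members
upload = aggregate_cluster(cluster_updates)                   # only this aggregate is uploaded
print(upload.shape)
```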
Transformer-based models have been widely demonstrated to be successful in computer vision tasks by modelling long-range dependencies and capturing global representations. However, they are often dominated by features of large patterns leading to the loss of local details (e.g., boundaries and small objects), which are critical in medical image segmentation. To alleviate this problem, we propose a Dual-Aggregation Transformer Network called DuAT, which is characterized by two innovative designs, namely, the Global-to-Local Spatial Aggregation (GLSA) and Selective Boundary Aggregation (SBA) modules. The GLSA has the ability to aggregate and represent both global and local spatial features, which are beneficial for locating large and small objects, respectively. The SBA module is used to aggregate the boundary characteristic from low-level features and semantic information from high-level features for better preserving boundary details and locating the re-calibration objects. Extensive experiments on six benchmark datasets demonstrate that our proposed model outperforms state-of-the-art methods in the segmentation of skin lesion images, and polyps in colonoscopy images. In addition, our approach is more robust than existing methods in various challenging situations such as small object segmentation and ambiguous object boundaries.
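To make the fusion idea behind the SBA description more tangible, below is a heavily simplified, illustrative sketch of combining a high-resolution low-level feature map (boundary cues) with an upsampled high-level feature map (semantics). This is not the paper's actual SBA module; the channel sizes, gating scheme, and class name are assumptions.

```python
# Heavily simplified low/high-level feature fusion in the spirit of the SBA idea
# (NOT the paper's module; all design choices here are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundarySemanticFusion(nn.Module):
    def __init__(self, low_ch: int, high_ch: int, out_ch: int = 64):
        super().__init__()
        self.low_proj = nn.Conv2d(low_ch, out_ch, kernel_size=1)
        self.high_proj = nn.Conv2d(high_ch, out_ch, kernel_size=1)
        self.fuse = nn.Conv2d(out_ch * 2, out_ch, kernel_size=3, padding=1)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # bring the coarse semantic map up to the boundary map's resolution
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear", align_corners=False)
        low, high = self.low_proj(low), self.high_proj(high)
        gate = torch.sigmoid(high)            # let semantics gate the boundary cues
        return self.fuse(torch.cat([low * gate, high], dim=1))

out = BoundarySemanticFusion(64, 256)(torch.randn(1, 64, 88, 88), torch.randn(1, 256, 11, 11))
print(out.shape)  # torch.Size([1, 64, 88, 88])
```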
Users' involvement in creating and propagating news is a vital aspect of fake news detection in online social networks. Intuitively, credible users are more likely to share trustworthy news, while untrusted users have a higher probability of spreading untrustworthy news. In this paper, we construct a dual-layer graph (i.e., the news layer and the user layer) to extract multiple relations of news and users in social networks to derive rich information for detecting fake news. Based on the dual-layer graph, we propose a fake news detection model named Us-DeFake. It learns the propagation features of news in the news layer and the interaction features of users in the user layer. Through the inter-layer edges in the graph, Us-DeFake fuses the user signals that contain credibility information into the news features, to provide distinctive user-aware embeddings of news for fake news detection. The training process is conducted on multiple dual-layer subgraphs obtained by a graph sampler to scale Us-DeFake to large-scale social networks. Extensive experiments on real-world datasets illustrate the superiority of Us-DeFake, which outperforms all baselines, and show that the users' credibility signals learned from interaction relations can notably improve the performance of our model.
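The toy construction below illustrates the dual-layer graph described above: a news layer with propagation edges, a user layer with interaction edges, and inter-layer edges linking users to the news they spread. It is purely illustrative; node identifiers and edge attributes are made up for the example.

```python
# Toy dual-layer graph: news layer, user layer, and inter-layer "spread" edges.
import networkx as nx

G = nx.Graph()
# news layer with propagation relations
G.add_nodes_from([("n1", {"layer": "news"}), ("n2", {"layer": "news"})])
G.add_edge("n1", "n2", relation="propagation")
# user layer with interaction relations
G.add_nodes_from([("u1", {"layer": "user"}), ("u2", {"layer": "user"})])
G.add_edge("u1", "u2", relation="interaction")
# inter-layer edges: which users spread which news
G.add_edge("u1", "n1", relation="spread")
G.add_edge("u2", "n2", relation="spread")

# a GNN-based detector in this spirit would learn news embeddings that also
# aggregate connected users' credibility signals through the inter-layer edges
print(G.nodes(data=True))
```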
Task-oriented dialogue systems often assist users with personal or confidential matters. For this reason, the developers of such a system are generally prohibited from observing actual usage. So how can they know where the system is failing and needs more training data or new functionality? In this work, we study ways in which realistic user utterances can be generated synthetically, to help increase the linguistic and functional coverage of the system, without compromising the privacy of actual users. To this end, we propose a two-stage Differentially Private (DP) generation method which first generates latent semantic parses, and then generates utterances based on the parses. Our proposed approach improves MAUVE by 3.8$\times$ and parse tree node-type overlap by 1.4$\times$ relative to current approaches for private synthetic data generation, improving both on fluency and semantic coverage. We further validate our approach on a realistic domain adaptation task of adding new functionality from private user data to a semantic parser, and show gains of 1.3$\times$ on its accuracy with the new feature.